GO Term Enrichment

Gene Ontology (GO) term enrichment is a technique for interpreting sets of genes using the Gene Ontology system of classification, in which genes are assigned to a set of predefined bins depending on their functional characteristics. For example, the gene FasR is categorized as being a receptor, involved in apoptosis and located on the plasma membrane. Researchers performing high-throughput experiments that yield sets of genes (for example, genes that are differentially expressed under different conditions) often want to retrieve a functional profile of that gene set in order to better understand the underlying biological processes. This can be done by comparing the input gene set with each of the bins (terms) in the GO: a statistical test can be performed for each bin to see if it is enriched for the input genes. The output of the analysis is typically a ranked list of GO terms, each associated with a p-value.


Background


The Gene Ontology

The Gene Ontology (GO) provides a system for hierarchically classifying genes or gene products into terms organized in a graph structure (or an ontology). The terms are grouped into three categories: molecular function (describing the molecular activity of a gene), biological process (describing the larger cellular or physiological role carried out by the gene, coordinated with other genes) and cellular component (describing the location in the cell where the gene product executes its function). Each gene can be described (annotated) with multiple terms. The GO is actively used to classify genes from humans, model organisms and a variety of other species. Using the GO it is possible to retrieve the set of terms used to describe any gene, or conversely, given a term, return the set of genes annotated to that term. For the latter query, the hierarchical system of the GO is employed to give complete results. For example, a query for the GO term for nucleus should also return genes annotated to the more specific term "nuclear membrane".
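The term-to-genes query described above can be sketched as follows. This is a minimal illustration, not the GO's own implementation: the parent links, the gene names `geneA` and `geneB`, and the function names are hypothetical toy data, and the real ontology has many more terms and several relationship types.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> list of parent terms.
PARENTS = {
    "nuclear membrane": ["nucleus"],
    "nucleus": ["cellular component"],
    "plasma membrane": ["cellular component"],
}

# Hypothetical direct annotations: gene -> terms it is annotated to.
DIRECT = {
    "geneA": ["nuclear membrane"],
    "geneB": ["nucleus"],
    "FAS": ["plasma membrane"],
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def genes_by_term(direct, parents):
    """Map each term to the genes annotated to it or to any more specific term."""
    index = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for reached in ancestors(term, parents):
                index[reached].add(gene)
    return index


index = genes_by_term(DIRECT, PARENTS)
print(sorted(index["nucleus"]))  # ['geneA', 'geneB']
```

Propagating each annotation up to all ancestor terms once, ahead of time, means a later query for a broad term such as "nucleus" is a single lookup and still includes genes annotated only to "nuclear membrane".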


Interpreting high-throughput data

Certain types of high-throughput experiments (e.g. RNA-seq) return sets of genes that are over- or under-expressed. The GO can be used to functionally profile this set of genes, to determine which GO terms appear more frequently than would be expected by chance when examining the set of terms annotated to the input genes. For example, an experiment may compare gene expression in healthy cells versus cancerous cells. Functional profiling can then be used to elucidate the underlying cellular mechanisms associated with the cancerous condition. This is also called term enrichment or term overrepresentation, as we are testing whether a GO term is statistically enriched for the given set of genes.
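One common way to make "more frequently than would be expected by chance" precise is the upper tail of a hypergeometric distribution. The symbols below are notation assumed for this illustration and do not come from the article: N is the number of genes in the background set, K the number of those annotated to the term, n the size of the input gene set, and k the number of input genes annotated to the term.

```latex
p \;=\; P(X \ge k) \;=\; \sum_{i=k}^{\min(n,\,K)} \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

A small p indicates that drawing n genes at random from the background would rarely yield k or more genes carrying the term.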


Methods

There are a variety of methods for performing term enrichment using GO. Methods may vary according to the type of statistical test applied, the most common being Fisher's exact test / hypergeometric test. Some methods make use of Bayesian statistics. There is also variability in the type of correction applied for multiple comparisons, the most common being the Bonferroni correction. Methods also vary in their input: some take unranked gene sets, others ranked gene sets, with more sophisticated methods allowing each gene to be associated with a magnitude (e.g. expression level), avoiding arbitrary cutoffs.
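As a rough sketch of the most common combination named above, the following applies a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the corresponding 2×2 table) to an unranked gene set, then a Bonferroni correction across the tested terms. The gene identifiers, term assignments and function names are hypothetical toy values chosen for illustration, and real tools differ in how they choose the background set and which terms they test.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N, K of which carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def enrich(study, background, term_to_genes):
    """Return (term, overlap, p, Bonferroni-adjusted p) tuples sorted by p."""
    background = set(background)
    study = set(study) & background
    N, n = len(background), len(study)
    n_tests = len(term_to_genes)  # one test per term
    rows = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        k = len(annotated & study)
        p = hypergeom_upper_tail(k, n, len(annotated), N)
        # Bonferroni: multiply by the number of tests, capped at 1.
        rows.append((term, k, p, min(1.0, p * n_tests)))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical toy data: 20 background genes and two terms.
BACKGROUND = [f"g{i}" for i in range(20)]
TERMS = {
    "apoptosis": BACKGROUND[0:5],
    "transport": BACKGROUND[5:15],
}
STUDY = ["g0", "g1", "g2", "g5"]

results = enrich(STUDY, BACKGROUND, TERMS)
for term, k, p, p_adj in results:
    print(f"{term}\tk={k}\tp={p:.4f}\tp_bonferroni={p_adj:.4f}")
```

In this toy example three of the four input genes carry the "apoptosis" term, giving a p-value of about 0.032 before correction. With only two terms tested the Bonferroni factor is 2. An analysis over thousands of GO terms multiplies by a correspondingly larger factor, which is one reason some methods use other corrections.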


Tools


* MOET: a web-based gene set enrichment tool at the Rat Genome Database for multiontology and multispecies analyses.
* PlantRegMap: GO annotation for 165 species and GO term enrichment analysis.
* PLAZA Workbench: GO, InterPro and MapMan enrichment analysis for different plant species.
* The Gene Ontology Consortium (GOC) provides a Term Enrichment tool.
* FunRich (PMID 25921073) is a Windows-based free standalone functional enrichment analysis tool.
* Blast2GO is a platform-independent desktop application to perform functional enrichment analysis as well as functional annotation of novel sequence data.

